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Abstract 


Thurstone’s  Law  of  Comparative  Judgment  provides  a  method  to  convert  subjective  paired  compar¬ 
isons  into  one-dimensional  quality  scores.  Applications  include  judging  quality  of  different  image  recon¬ 
structions,  or  different  products,  or  different  web  search  results,  etc.  This  tutorial  covers  the  popular 
Thurstone-Mosteller  Case  V  model  and  the  Bradley- Terry  logistic  variant.  We  describe  three  approaches 
to  model-fitting:  standard  least-squares,  maximum  likelihood,  and  Bayesian  approaches.  This  tutorial 
assumes  basic  knowledge  of  random  variables  and  probability  distributions. 
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1  Why  Paired  Comparisons? 

When  comparing  different  options,  one  often  wishes  to  assign  a  single  quality  score  to  each  option.  For 
example,  you  may  want  to  score  the  quality  of  different  image  processing  algorithms,  or  the  skill  of  different 
chess  players,  or  the  seriousness  of  different  crimes.  To  illustrate,  we  place  three  images  of  an  apple  on  a 
quality  scale  in  Figure  1.  Here,  we  are  concerned  with  scores  on  an  interval  scale,  which  means  that  only  the 
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Figure  1:  Image  quality  scale 


relative  differences  between  scores  matters,  and  the  scores  could  be  shifted  or  multiplicatively  scaled  without 
changing  their  meaning.  An  interval  scale  has  two  degrees  of  freedom:  the  placement  of  zero,  and  the  unit  of 
measurement.  Fahrenheit  and  Celcius  temperature  scales  are  interval  scales,  because  the  difference  between 
two  temperatures  matters,  but  the  placement  of  0°  and  the  unit  of  “1  degree”  is  arbitrary.1 

How  do  you  gather  data  to  score  a  set  of  options?  It  is  tempting  to  simply  ask  a  bunch  of  people  to  score 
each  of  your  options:  “On  a  scale  of  1  to  10,  what  is  the  quality  of  this  image?”  However,  people  may  mean 
different  things  by  the  same  score  (a  3  to  one  person  may  be  different  than  a  3  for  another  person).  It  may 
be  hard  to  determine  specifically  what  “1”  and  “10”  mean  (how  bad  does  an  image  have  to  be  to  get  a  1?). 
It  may  be  inflexible  (what  if  you  want  to  give  something  a  15?).  Further,  you  may  care  more  about  scoring 
the  options  in  the  context  of  the  set,  rather  than  on  an  absolute  scale  (you  want  to  know  “How  much  better 
does  this  image  look  than  the  other  options?”  rather  than  “Does  this  image  look  good?”).  Because  of  these 
issues,  gathering  paired  comparisons  may  be  more  useful  than  directly  asking  for  quality  scores. 

In  a  paired  comparison  experiment,  you  ask,  “Is  A  better  than  B?”  Generally  ties  are  not  allowed  (or 
they  may  be  counted  as  half  a  vote  for  each  option).  Ideally  you  would  get  comparisons  for  all  possible 
pairs  of  options  you  are  judging,  but  this  is  not  necessary  to  estimate  the  scores,  and  for  a  large  number  of 
options,  may  simply  be  infeasible.  There  may  also  be  issues  of  order  presentation  (which  option  is  presented 
first  could  affect  the  preference)  but  in  the  rest  of  this  tutorial  we  assume  that  this  issue  can  be  ignored. 

The  result  of  a  paired  comparison  experiment  is  a  count  matrix,  C ,  of  the  number  of  times  that  each 
option  was  preferred  over  each  other  option. 


Ci, 


#  times  option  i  preferred  over  option  j  if  i  ^  j 
0  if  *  =  j. 


1.1  Roadmap 

Now  that  we  have  defined  the  problem  and  the  experimental  count  data,  in  the  next  Section  we  describe 
Thurstone’s  statistical  model  of  judgments  and  Thurstone’s  Law  of  Comparative  Judgment,  which  provide 
a  method  of  estimating  the  quality  score  difference  for  two  options.  We  will  then  extend  the  analysis  to 
estimating  scores  for  more  than  two  options  using  a  least  squares  method  (Section  3),  a  maximum  likelihood 
method  (Section  4),  and  using  the  expected  value  of  the  score  (Section  5).  We  illustrate  the  different 
approaches  with  simulations  (Section  6).  This  document  ends  with  Matlab  code  to  implement  the  described 
functions. 

1An  alternate  is  a  ratio  scale ,  where  the  zero  value  is  fixed  e.g.  age,  where  0  years  is  a  specific  reference  point.  In  a  ratio 
scale  we  can  always  compare  to  0,  so  20  years  is  twice  as  old  as  10  years,  whereas  on  a  interval  scale  20°  is  not  twice  as  hot  as 
10°.  The  classification  of  measurement  scales  is  discussed  by  Stevens  [10]. 
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2  Models  for  Comparative  Judgment 

There  are  two  common  models  for  analyzing  paired  comparison  data.  We  first  discuss  Thurstone’s  model, 
and  then  the  Bradley-Terry  model. 

2.1  Thurstone’s  Model 

In  1927,  Louis  Leon  Thurstone  pioneered  psychometrics  by  using  Gaussian  distributions  to  analyze  paired 
comparisons  [11,  7].  Thurstone’s  model  assumes  that  an  option’s  quality  is  a  Gaussian  random  variable. 


Probability  Densities  for  A  and  B 


Figure  2:  PDFs  of  A  and  B 


Consider  the  basic  case  of  two  options,  where  we  let  the  Gaussian  random  variables  A  and  B  represent  the 
quality  of  option  A  and  option  B  respectively: 

A~  Af{nA,cr2A),  B  ~ 

P(A  =  a)  =  A-  0(2^l)  ,  P(B  =  b)  =  A_.  , 

where  </>  is  the  standard  normal  PDF  (zero  mean,  unit  variance), 


4>{x) 


1 

V2tt 


1 
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Thurstone’s  model  says  that  when  a  person  judges  whether  A  is  better  than  B,  they  draw  a  realization  from 
A’s  quality  distribution  and  a  realization  from  B’s  quality  distribution,  and  then  choose  the  option  with  the 
higher  quality.  Equivalently,  they  choose  option  A  over  option  B  if  their  draw  from  the  random  variable 
A  —  B  is  greater  than  zero, 

P(A  >  B)=  P(A-B  >  0). 

Since  A  —  B  is  the  difference  of  two  Gaussians,  A  —  B  is  a  Gaussian  random  variable: 

A  —  B  ~  J\f(p,  a) 

Pab  =  H  A  —  Hb 

(j\b  =  aA  A  ~  ‘ZpABV A&Bi  (1) 


where  pAB  is  the  correlation  between  A  and  B. 
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PDF  of  A-B 


Figure  3:  P(A  >  B )  is  the  shaded  area  under  the  PDF  of  A  —  B. 


Therefore  the  probability  of  choosing  option  A  over  option  B  (shown  in  Figure  3)  is 


P(A  >  B)  =  P(A  -  B  >  0) 


/  0  sf^^AB 


g  ( x  Vab)  / ab) 


/  1  e-x  /{^AB)dx 

l-HAB  V2™~AB 
C^ab  0  _ 

1  e~x  /^)dx 

t  —  oo  v  J  ab 
r^AB 


[ 

J- oo  V2™ 

/Hab 

A—  cj)(-s—)dx 

& AB  ^  v  &AB  ' 

-OO 


>  — OO 
MAB 
&AB 


<j>(t)dt 


=  $ 


d’AB 
&  AB 


where  <f>(z)  is  the  standard  normal  cumulative  distribution  function  (CDF) 

$(2)  =  L_  f  e~ t  !2  dt  —  f  (j){t)  dt. 

v  2tt  J —00  J  —00 

By  inverting  (2),  we  can  calculate  the  quality  difference  / lab  as 

Hab  =  &ab  (P(A  >  B)) , 


(2) 


where  $_1  (x)  is  the  inverse  CDF  of  the  standard  normal.  The  inverse  CDF  of  the  standard  normal  is  also 
commonly  known  as  the  z-score  or  standard  score  since  it  gives  the  number  of  standard  deviations  that  x 
is  from  the  mean.  Although  traditionally  getting  the  z-score  required  lookup  tables,  modern  computers  can 
calculate  the  inverse  CDF  function  precisely. 

Thurstone  proposed  estimating  P(A  >  B)  by  the  empirical  proportion  of  people  preferring  A  over  B, 
C a,b / (C a,b  +  Cb,a)-  So,  the  estimator  for  the  quality  difference  (iab  is 


dAB  =  cab  $ 


Ca,b 

Ca,b  +  Cb.a 


(3) 

(4) 


The  estimate  (3)  is  known  as  Thurstone’s  Law  of  Comparative  Judgment. 
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2.2  Thurstone’s  Case  V  Model 


Thurstone  made  a  number  of  model  simplifications  for  tractability.  The  estimate  (3)  requires  <tab  to  be 
known  or  estimated,  and  in  the  general  model  that  requires  estimating  the  variances  and  correlation  of  A 
and  B  as  per  (1).  The  simplest  and  most  popular  simplification  is  the  Case  V  model,  which  assumes  that 
all  options  have  equal  variance  and  zero  correlations  (or  less  restrictively,  equal  correlations  instead  of  zero 
correlations  [9]): 

2  _  2 
aA  ~  aB 

PAB  =  0. 


Without  loss  of  generality2,  set  the  variances  to  one  half  a\  =  \  so  the  variance  of  A  —  B  is  one, 

2  _  2  2  _  i 

aAB  ~  aA  +  aB  ~  !• 


This  sets  the  scale  unit  for  the  interval  scale.  This  simplifies  Thurstone’s  Law  given  in  (3)  for  Case  V  to 


A  AB 


$ 


-l 


Ca 


Ca,b  +  Cb,a 


(5) 


For  the  rest  of  this  tutorial,  we  will  use  these  Case  V  assumptions  and  refer  to  this  Case  V  model  as 
Thurstone ’s  model. 

Another  common  approach  is  to  assume  that  u\  =  cr^  =  1,  so  that  a  =  V 2,  and  Thurstone’s  law  for 
Case  V  carries  an  extra  constant  of  p  =  V‘2®~1(qa  )■  This  makes  the  equations  less  convenient,  but 

then  the  quality  scale  differences  are  easily  interpreted  as  the  number  of  standard  deviations.  If  you  want 
the  quality  scaled  in  terms  of  standard  deviations,  you  can  multiply  the  quality  scale  values  in  this  tutorial 
by  \/2  (which  is  ok  since  the  interval  scale  is  not  affected  by  multiplicatively  scaling  the  scores). 


2.3  Prior  Knowledge 

Prior  knowledge  is  easily  incorporated  into  the  model  by  adding  values  to  the  count  matrix  according  to 
what  you  believe  the  proportion  of  counts  should  be  a  priori.  Create  a  matrix  B  of  the  proportion  of  times 
you  believe  one  option  would  be  preferred  over  the  other.  Then  add  a  weighted  version  to  the  collected  data. 

C  =  C  +  aB. 

Using  a  prior  B  can  help  regularize  the  estimates  and  solve  the  0-1  problem,  as  we  discuss  in  Section  3.2. 


2.4  The  Bradley- Terry  model 


Bradley  and  Terry  [2]  introduced  an  alternate  model,  also  known  as  the  Bradley-Terry-Luce  model  (BTL) 
for  Luce’s  extension  to  multiple  variables  in  [6].  The  BTL  model  differs  from  the  Thurstone  model  in  that 
it  uses  Gumbel  random  variables  for  the  quality  of  each  option  instead  of  a  Gaussian.  Then  the  BTL  scale 
difference  A  —  B  is  a  logistic  random  variable,  and  so  P(A  >  B)  can  be  calculated  from  the  logistic  cumulative 
distribution  function.  This  model  has  a  closed-form  solution: 


P(A  >  B)=  P(A  —  B  >  0) 


exp  {pa/s) 

exp(pA/s)  -(-  exp (hb/s) 


1  .  1  ,  i  ,Pa~  PBs. 
_  +  _tanh(__); 


(6) 

(7) 


where  s  is  the  scale  parameter  for  the  logistic  distribution  (changing  s  changes  the  variance,  but  not  the 
mean).  Equation  (6)  says  that  the  probability.  Then  by  again  estimating  P{A  >  B)  by  the  empirical  count 
proportion  c, -  and  inverting  (7)  yields  the  estimate 


PA  -  Pb 


In  (  - 


CA,. 


\Ca,b  +  Cb,a 


-In  1- 


CA. 


Ca.b  +  Cb,a 


(8) 
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The  inverse  logistic  CDF  in  (8)  is  commonly  called  the  logit.  To  compare  the  BTL  model  scale  differences  with 
the  Thurstone  model  ones  from  (3),  equate  the  variance  by  setting  s  =  — .  Empirically  as  shown  in  Figure 
4,  the  logistic  CDF  is  very  similar  to  the  Gaussian  CDF,  so  that  using  Thurstone-Mosteller’s  solution  or 
BTL’s  model  produces  very  similar  results.  Some  people  prefer  BTL  for  computational  simplicity  (you  don’t 
have  to  compute  the  inverse  Gaussian  CDF),  although  with  modern  computers  and  algorithms,  computing 
the  inverse  Gaussian  CDF  is  simple,  so  the  computational  aspect  is  not  an  issue. 


Figure  4:  Gaussian  vs  Logistic  CDF 

The  logistic  CDF  has  a  fatter  tail  and  is  slightly  more  sloped  at  the  inflection  point  than  a  Gaussian  with 
the  same  mean  and  variance.  This  means  that  the  BTL  model  will  estimate  slightly  smaller  scale  differences 
for  proportions  near  |  and  slightly  larger  scale  differences  for  proportions  near  0  or  1  when  compared  with 
Thurstone’s  model. 


3  Model  Fitting 

Thurstone’s  model  provides  a  method  of  estimating  the  scale  difference  for  any  single  pair  of  options  by 
estimating  P(A  >  B)  by  the  empirical  proportion  of  people  preferring  A  to  B.  However,  when  considering 
more  than  two  options  this  approach  breaks  down  because  these  values  need  to  be  massaged  to  fit  on  a  one 
dimensional  scale.  In  this  section  we  detail  three  different  approaches  to  estimating  the  quality  scores  given 
more  than  two  options  for  the  Thurstone  model  (the  same  approaches  can  be  applied  to  the  BTL  model). 

3.1  Thurstone-Mosteller  Least  Squares  Method 

To  determine  the  quality  scores  for  a  set  of  m  options,  Thurstone  offered  a  solution,  which  Mosteller  later 
showed  was  the  solution  to  a  least  squares  optimization  problem  [9] .  Define  the  vector  of  quality  scores 
/it  =  [hi,  p-2, ...,  tim\i  and  let  D  be  an  m  x  m  matrix  where  Di j  =  d*”1  yc  C+c.  ^  is  the  (Case  V)  Thurstone’s 

Law  estimate  (5)  for  the  quality  difference  between  option  i  and  option  j.  (You  may  also  use  the  BTL  model 
by  forming  D  using  the  logit  from  (8).)  The  least  squares  estimate  for  the  quality  scores  /r  minimizes  the 
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squared  error  between  the  quality  scores  and  the  Thurstone’s  Law  pairwise  estimates: 


A  =  arS  m|n  -  (m»  -  A?))2 

ij 


(9) 


This  least  squares  problem  has  a  simple  closed-form  solution  which  can  be  derived  from  the  D  matrix. 
If  we  set  fi i=0,  the  least  squares  solution  is 


J  ^  jji  <  ^  m 


m 


i—  1  z=l 

If  instead  of  assuming  [ii  =  0,  you  prefer  to  assume  that  the  mean  of  all  the  /q  is  zero,  the  solution  is 

m  „ 


(10) 


(11) 


3.2  The  0/1  Problem 

When  estimating  quality  differences  by  this  least  squares  method,  we  will  have  problems  estimating  when 
the  proportion  C/+cb  a  *s  9  or  b  since  4>_1(0)  =  — oo  and  4>_1(1)  =  oo. 

There  are  a  couple  of  standard  ways  to  deal  with  this  problem  [3],  and  we  note  a  third  optimal  solution. 
One  solution  to  the  0/1  problem  is  to  simply  ignore  the  0/1  entries  and  use  an  incomplete  matrix  solution 
[8,  4].  We  argue  this  is  too  heavy-handed  a  fix  in  that  it  ignores  important  information  that  the  one  option 
is  strongly  preferred  to  the  other  option. 

Another  standard  solution  is  to  “fix”  the  0/1  proportions  by  modifying  the  entries  in  the  count  data 
matrix  to  change  the  0/1  proportions  to  be  -  and  1  —  — ,  where  n  is  the  number  of  people  surveyed  for  that 
comparison.  This  is  equivalent  to  modifying  the  count  data: 

r  §  if  -  o 

Cij  =  <  Cij  —  \  if  Cij  =  n  (or  equivalently  if  Cji  =  0)  (12) 

I  Cij  otherwise 

We  will  refer  to  this  modified  data  matrix  as  the  0/1  fixed  data.  A  related  solution  is  to  add  a  count  or 
fractional  count  to  both  the  0  and  1  entries  of  the  count  matrix;  this  is  equivalent  to  assuming  some  prior 
data  (see  Section  2.3).  These  fixes  do  change  the  count  matrix,  but  in  a  conservative  way  that  biases  the 
counts  toward  less  confidence,  and  this  fix  is  not  as  big  a  change  as  simply  ignoring  0/1  entries.  If  one 
employs  this  fix,  we  recommend  adding  1/2  count  to  adequately  fix  the  0/1  problem  without  adding  too 
much  noise  to  the  data,  though  we  note  there  is  no  general  optimal  value. 

We  note  a  third  solution  if  there  are  more  than  two  options:  estimate  the  means  by  the  maximum 
likelihood  estimate,  which  we  detail  in  Section  4.2.  This  is  the  solution  we  recommend  because  it  is  an 
optimal  approach  to  the  estimation  and  does  not  require  adding  noise  or  ignoring  data. 


4  Maximum  Likelihood  Scale  Values 

First,  we  show  that  if  there  are  just  two  options,  the  maximum  likelihood  estimate  of  the  quality  score 
difference  is  given  by  Thurstone’s  law  (3).  Then  we  give  the  maximum  likelihood  estimate  for  multiple 
options  (which  is  not  equivalent  to  the  least-squares  solution  given  above). 
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4.1  Maximum  Likelihood  for  Two  Options 

Let  the  paired  judgment  data  for  two  options,  A  and  B,  be  a  =  Ca,b  and  b  =  Cb,a-  The  likelihood  of  the 
quality  difference,  [Iab  =  Ha  ~  Mb,  is 

L(mab)  =  P(a ,  5|mab)  =  —  P(A  >  B)aP(B  >  A)b  (13) 

7 

=  -  <h(MAB)a<l)(— Mab)6, 

7 

where  7  is  a  normalization  constant  and  we  have  used  the  identity  P{B  >  A)  =  1  —  4 ’(mab)  =  4)(— mab)- 
The  maximum  likelihood  quality  difference  is 


Mab  =  argmax  L(mab) 

Mas 

=  argmax  (^(^b))0  (^(-mab))6 

MAS 

=  argmax  a  log  ($(mab))  +  blog  ($(-/zab))  • 

fJ’AB 


This  may  be  solved  by  Lagrange  multipliers 

0  = 


4>(hab)  ~  -rr— - r</>(-MAs) 


$(m ab ) 


$(— Mab) 
b 


$(Mab)  1  —  4>(mab) 


Mab  =  4> 


a  +  b 


=  4>” 


Ca,i 


Ca.b  +  Cb.a 


which  verifies  that  Thurstone’s  Law  yields  the  maximum  likelihood  solution  if  there  are  only  m  =  2  options. 


4.2  Maximum  Likelihood  for  Multiple  Options 

Extending  the  two  option  maximum  likelihood  estimation  described  in  Section  4,  to  a  comparison  of  m 
options,  there  is  no  longer  a  closed-form  solution.  Instead,  one  must  solve  a  optimization  problem. 

Let  /it  be  the  vector  of  quality  scale  values  /i  =  [mi,M2 ,-,/im]-  Define  the  log-likelihood  of  our  count 
data,  C,  as 

£(m|C)  ^  ^  loS (*(Mi  ^  Mi))-  (14) 

hi 

To  find  the  maximum  likelihood  solution  quality  scale  values,  one  must  solve 

arg  max 
A* 

subject  to 

To  find  a  unique  solution,  include  the  constraint  that  the  mean  of  all  the  quality  scale  values  is  zero  as  in 
(15),  or  set  one  of  the  quality  scale  values  to  zero  Hi  =  0. 

We  show  in  the  appendix  that  (15)  is  a  convex  optimization  problem. 


X]  loS($Oi  -  Mi)) 

X]  Mi  =  0. 


(15) 
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4.3  Maximum  A  Posteriori  Estimation 

One  can  also  form  the  maximum  a  posteriori  (MAP)  estimate,  by  including  a  prior  on  the  scale  values  p(/z): 

argmax  C{fi\C)  +  log  (p(/x)) 

subject  to  5>  =  °- 


If  there  is  little  information  about  the  true  scale  values  that  can  be  used  to  choose  a  prior,  then  we  suggest 
a  Gaussian  prior  that  assumes  the  different  scale  values  are  drawn  independently  and  identically  from  a 
standard  normal  will  reduce  the  estimation  variance  and  often  provide  better  estimates.  In  that  case  the 
MAP  estimate  solves: 


arg  max 


Ci>i  log($(Mi  -  Mi))  ^  f~2 

i,j  i 


(16) 


subject  to 


51  Mi  =  0. 


I 


This  choice  of  prior  acts  as  a  ridge  regularization  on  the  scale  values  [5],  and  is  still  a  convex  optimization. 


4.4  Advantages  of  Maximum  Likelihood  Estimation 

Maximum  likelihood  estimation  is  an  optimal  approach  to  estimation  problems  in  the  sense  that  it  produces 
the  solution  that  is  most  likely.  Additionally,  this  maximum  likelihood  solution  does  not  suffer  from  the 
0/1  problem  of  the  least  squares  methods  because  the  maximum  likelihood  method  does  not  use  the  inverse 
CDF.  The  0  entries  in  the  count  matrix  do  not  contribute  to  the  likelihood  and  pi  are  constrained  by  the 
other  terms  in  the  log-likelihood  to  keep  them  from  being  driven  to  oo. 


5  Expected  Quality  Scale  Difference 

Instead  of  using  the  maximum  likelihood  quality  difference,  one  can  estimate  the  quality  difference  to  be 
the  expected  quality  difference  where  the  expectation  is  taken  with  respect  to  the  likelihood  (or  with  respect 
to  the  posterior  mean  if  there  is  a  prior).  On  average,  we  expect  this  solution  to  perform  better  since  this 
approach  uses  the  full  likelihood  information  rather  than  just  the  maximum  of  the  likelihood. 

5.1  Expected  Quality  Estimate 

Consider  the  two  option  case.  Treat  the  unknown  quality  score  difference  as  a  random  variable  U .  The 
likelihood  P(a ,  b\U  =  u)  is  the  probability  of  observing  a  people  preferring  option  A,  and  b  people  preferring 
option  B,  as  given  in  (13). 

P(a,  b\U  =  u)  =  (^  P(A  >  B\U  =  u)a  P(B  >  A\U  =  u)b. 

Using  Bayes  rule,  the  posterior  can  be  written  in  terms  of  the  likelihood  as 

P(U  =  „|a,  b)  =  p{aM  c  =  “Wt/ =  «■) 

P{a,b) 


P(U  =  u\a,  b) 


1  P(A  >  B\U  =  u)a  P(B  >  A\U  =  u)b  P{U  =  u ) 
7 

-$(w)a(l-$(zt))hP([/  =  u), 

7 
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where  7  is  a  normalizer  constant,3 


/»00 

7=  /  <S>(u)a(l-$(u))bP(U  =  u)du. 

J  OO 


If  we  assume  a  uniform  prior  for  P(U  =  u)  over  the  range  [—t,  t ],  then  the  expected  quality  scale  difference 


U  is 


1  f* 

E[U\a,  b\  =  —  u  4>(u)a(l  -  4>(u))b  du. 

2^7  J-t 


(17) 


Alternatively,  we  could  assume  a  Gaussian  prior  for  P(U  =  u).  Using  the  standard  normal,  the  expected 
quality  scale  difference  U  is 


1  f°° 

E[U\a ,  b]  =  -  u  4>(u)a(l  -  $(zt))b  <f>(u)dv 
7  J —00 

[\~\p)pa(i-p)bdp. 

7  Jo 


(18) 


5.2  Computation  of  Expected  Quality  Estimate 

Unfortunately,  no  closed  form  solution  exists,  so  the  integral  and  the  normalizer  constant  must  be  numerically 
computed.  Numerical  integration  is  slow,  and  may  be  prone  to  precision  errors  depending  on  the  method  of 
integration.  Matlab  may  be  used  to  attempt  to  approximate  the  integral  (trapz,  quad,  quadgk),  but  Matlab 
is  limited  to  machine  precision  (32  or  64  bits)  and  may  not  evaluate  the  integral  accurately  enough.  (Matlab 
may  return  0  when  the  actual  solution  should  be  a  very  small  non-zero  number).  Mathematica  has  more 
sophisticated  numerical  integration  routines  and  arbitrary  precision  calculations.  Maple  may  also  be  used. 


5.3  Bayesian  Estimation  of  Preference  Probability 

In  the  previous  section,  we  considered  the  quality  difference,  U,  to  be  a  random  variable,  and  estimated  it 
by  taking  its  expectation.  In  this  section,  we  instead  consider  the  probability  that  option  i  is  chosen  over 
option  j  to  be  a  random  variable,  Xjj .  We  consider  the  result  of  performing  Bayesian  estimation,  and  the 
relation  to  priors  and  smoothing. 

The  count  data  is  generated  from  a  binomial  distribution  where  the  parameter  aq  j ,  is  an  observation  of 

Xid: 


?  (J j-i ^  Binomf CJj  j  T  Cj^i  \  x i  j ) 

OC  XijCiJ(  1  -  Xij)0*'* 


After  observing  the  data  C  and  assuming  a  uniform  prior  probability  on  Xij ,  the  posterior  probability  of 
Xij  has  a  beta  distribution  with  parameters  (7+1: 

l^++  Q?,*)  ^  1 (1  a+j)  i'1 

~  Beta(x*i  j \Ci.j  +  1,  C j,i  +  1).  (19) 


The  maximum  a  posteriori  estimate  of  Xij  is  Xij  = 


c, 


(the  mode  of  the  beta  distribution  (19)). 


Cij  +  Cji% 

Then  calculate  the  quality  scale  difference  by  setting  xl  j  =  4>(u.j  —  Uj)  and  inverting,  which  results  in 
Thurstone’s  law: 


Ui  —  Uj  =  ' I> 


-1 


Ci 


Cij  +  Cj.i 


3Although  at  first  glance  P(u\a,  b)  looks  similar  to  a  beta  distribution,  it  is  parameterized  by  u,  instead  of  p  =  <!’(+),  so  it 
is  not  the  same.  The  normalizer  must  be  calculated  numerically. 


UWEETR-2011-0004 


10 


1  trial,  50  people,  5  options 
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Figure  5:  Results  of  ofe  tej 
estimates  from  least-squares 
timate  assuming  the/TThtfitP' 
model  (“LS  (BTL  model)”). 
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comparing  five  options  with  true  qualities  (—.5,  —.2^,  0,  .25,  .5).  Plot  compares 
estimate  assuming  the  Thurstone  model  (“LS”),  the  maximum  likelihood  es- 

(L)eas t  -sq u a resQ'^i mate  assuming  the  BTL 


A 


A  A 


b^y©S  fflF  '  ',s  consider  estimating  §ifj  by  its  me§n#E’[Jf,: fJ]  = 


jf  the  quality;  scale  difference  is 


C, 


Cij  +  Cjti  +  2 


„-£>$ = *-o(2a%±im ) .  0.25 


Cij  +  Cji  +  2 


0.5 


This  result  is  equivalent  to  the  Thurstone’s  law  estimate  if  one  puts  a  prior  of  1  on  all  the  counts,  meaning 
that  a  priori  you  believe  that  all  of  the  choices  are  possible.  This  may  also  be  interpreted  as  Laplace 
smoothing  the  count  data. 


6  Illustrative  Experiments 

We  illustrate  the  different  estimation  approaches  with  some  simulations.  We  make  n  quality  observations 
for  each  of  pair  of  to  options  (simulating  surveying  n  people  for  their  preferences  about  all  possible  pairs  of 
to  options).  We  first  generate  the  true  means  for  the  to  quality  score  distributions.  Then,  for 

each  pairwise  comparison  for  each  person,  the  perceived  quality  scores  for  the  ith  option  are  drawn  IID  from 
=  |)  as  in  Thurstone’s  Case  V.  The  count  data  is  collected,  and  /t  is  estimated  from  the  data. 

Fig.  5  illustrates  the  results  of  one  run  of  the  simulation,  with  n  =  50  people  surveyed  about  all  pairs 
of  to  =  5  options.  The  true  mean  values  are  marked  on  top,  and  are  at  (— .5,  —  .25, 0,  .25,  .5).  The  mean 
quality  estimates  are  shown  on  the  next  rows  for  the  least-squares  estimate  assuming  the  Thurstone  model, 
the  maximum  likelihood  estimate  assuming  the  Thurstone  model,  and  the  least-squares  estimate  assuming 
the  BTL  model. 

Fig.  6  shows  results  for  a  single  pair  of  options  (to  =  2),  averaged  over  10,000  runs  of  the  simulation  for 
varying  true  quality-differences.  If  the  true  quality  difference  (shown  on  the  x-axis)  is  large,  then  the  0/1 
problem  occurs  (see  Sec.  3.2),  and  the  Thurstone’s  Law  estimate  given  by  (3)  is  that  the  quality  difference 
is  “infinite.”  For  this  case  of  to  =  2  options,  recall  that  (3)  is  also  the  maximum  likelihood  estimate.  The 
green  line  is  the  maximum  likelihood  estimate  given  a  prior  of  1  count  to  both  options  (this  could  also  be 
called  a  maximum  a  posterior  estimate).  This  is  always  well-defined  and  performs  better  than  Thurstone’s 
Law  for  all  quality  differences.  The  red  line  shows  the  mean  quality  estimate  as  given  by  (17).  This  is  a 
more  robust  estimate,  and  as  shown  in  Fig.  6,  will  perform  better  than  the  maximum  likelihood  with  prior 
when  the  true  quality  difference  is  large,  but  may  perform  worse  when  the  true  quality  difference  is  small. 

Fig.  7  shows  the  simulation  results  when  there  are  to  =  10  options.  These  results  were  averaged  over 
1000  runs  of  the  simulation.  For  each  run,  the  true  mean  quality  of  each  of  the  ten  options  was  chosen 
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Figure  6:  Result  of  10,000  trials  with  two  options.  In  each  trial  25  people  judge  the  paired  comparison.  The 
mean  quality  of  option  A  is  fixed  at  0,  and  the  mean  quality  of  option  B  is  varied  along  the  x  axis. 


uniformly  on  [—x,x]. 

Fig.  7  compares  the  Bradley- Terry  model  (labelled  “BTL”)  with  the  Thurstone  model  (all  results  not 
labelled  “BTL”).  We  also  compare  the  different  methods  of  solving  the  0/1  problem  (Section  3.2):  the  least 
squares  methods  (“LS”)  where  any  0/1  proportions  were“fixed”  according  to  (12)  (denoted  by  “0/1  fix”), 
Morrissey  and  Gulliksen’s  incomplete  matrix  solution  [8,  4]  (denoted  by  “incomplete”),  maximum  likelihood 
(“ML”),  and  the  maximum  a  posteriori  estimate  where  the  prior  is  independently  and  identically  a  standard 
normal  on  each  of  the  quality  scores  (“MAP”,  from  (16)).  We  show  two  different  metrics  in  Fig.  7: 
Interval  Mean  Squared  Error:  the  average  squared  error  in  the  quality  scale  difference  for  each  option 
pair. 


Si,j  =  Hi  —  Hj  =  ground  truth  quality  difference 

S*j  =  H*i  ~  H*j  =  estimated  quality  difference 

(St  j  -  S'/,)2 
Interval  MSE  =  V  — ^ 

'  m(m  —  1) 

Probability  Mean  Squared  Error:  the  average  squared  error  in  the  estimated  choice  probability  for  each 
option  pair. 


Pij  =  P(Qi  >  Qj )  =  $(/ij  —  Hj)  =  ground  truth  choice  probability 
P*j  =  ${Hi  —  Hj)  =  estimated  choice  probability 
(p.  ■  —  p*  )2 

Choice  Probability  MSE  =  - . 

m(m  —  1) 

When  the  true  qualities  are  close  together,  the  BTL  logistic  model  performs  slightly  better  than  the 
Thurstone-Mosteller  Gaussian  model  because  the  logistic  CDF  has  a  steeper  slope  when  the  probability  is 
|  so  it  estimates  slightly  lower  values.  The  least  squares  methods  perform  worse  as  the  true  means  become 
more  separated,  but  the  maximum  likelihood  methods  perform  better. 
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1000  trials,  25  people,  10  options,  true  quality:  Unif(-x,x) 

Choice  Probability  MSE  Interval  MSE 


Figure  7:  Result  of  1,000  trials  comparing  ten  options.  In  each  trial  25  people  judge  each  paired  comparison. 

7  Summary 

Fitting  Thurstone’s  model  or  the  BTL  model  to  paired  comparison  data  can  be  a  useful  tool  to  analyze  the 
relative  qualities  of  a  set  of  options.  Various  estimation  methods  can  be  used  to  fit  each  model.  If  the  true 
quality  differences  are  believed  to  be  separated  by  at  least  a  standard  deviation,  then  we  suggest  using  the 
maximum  a  posteriori  estimate  is  advantageous  because  we  have  seen  it  consistently  produce  good  results, 
it  is  an  optimal  approach  to  estimation,  the  data  does  not  need  to  be  modified  to  avoid  0/1  problems,  and 
it  can  be  solved  efficiently  as  a  convex  optimization  problem. 
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8  Appendix 

The  maximum  likelihood  solution  (15)  is  equivalent  to  solving  for  the  =  Hi  —  Hj  that  solves 

Ci’i  lQg($(^i)) 

(J-ij  +  Vjk  =  Hik  V i,j,  k  G  {1, m} 

The  likelihood  )C?:  j  Cjj  log($(/tZjj))  is  concave  if  <f>  is  a  log-concave  function,  since  log(<&(a;))  would  be  concave 
and  concavity  is  preserved  under  addition  and  positive  scaling. 

Definition.  Log-concave.  A  function  f  :  R™  — >  R  is  log-concave  if  f(x)  >  0  and  g(x)  =  log iff))  is  concave 
(or  equivalently  if  h(x)  =  —  log (f{x))  is  convex). 

Proof.  Use  the  fact  that  log-concavity  is  closed  under  convolution.  Since  the  Gaussian  PDF  is  log-concave 
(the  second  derivative  of  its  log  is  log^(a:)  =  —1),  and  the  CDF  is  the  convolution  of  the  Gaussian  PDF 
with  the  unit  step  function  (which  is  also  log-concave).  □ 

The  concavity  of  the  Gaussian  CDF  may  alternatively  be  proved  because  the  CDF  of  a  differentiable 
log-concave  probability  density  is  log-concave.  The  Gaussian  is  a  differentiable  log-concave  density  function, 
so  the  CDF  of  a  Gaussian  is  log-concave. 

Corollary  1.  /( x)  is  log-concave  iff  f(x)  >  0  and  ( f'(x ))2  >  f"{x)f{x) 

This  follows  from  the  condition  that  h(x)  =  —  \og(f(x))  is  convex  iff  the  second  derivative  of  h(x)  is 
greater  than  or  equal  to  zero. 

Lemma  1.  The  cumulative  distribution  function  of  a  log-concave  differentiable  probability  density  is  log- 
concaveA 

Proof.  Let  g{t)  =  exp (—h[t))  be  a  differentiable  log-concave  probability  density  function  and  let  the  cumu¬ 
lative  distribution  be 

fix)  =  f  git)  dt=  j  e~h{t)  dt. 

J— oo  J  —  oo 

We  prove  that  /  is  log-concave  by  showing  that  /  satisfies  corollary  1. 

The  derivatives  are 


arg  max 

V«3 

subject  to 


fix)  =  g(x)  =  e~hW 

fix)  =  g’{x)  =  - h'(x)e~h{x)  =  -h\x)g(x). 

For  h'(x)  >  0,  since  f(x)  >  0  and  g(x)  >  0, 

fix)  fix)  =  —f(x)h!  (x)g(x)  <  0 
if(x))2  =  gix)2  >  0. 


Therefore,  Corollary  1  holds. 

To  show  Corollary  1  holds  for  h'(x)  <  0,  we  note  that  since  h  is  convex, 

h(t)  >  h(x)  +  h'(x)(t  —  x). 

4This  lemma  appears  in  Boyd  and  Vandenberghe’s  Convex  Optimization  [1],  as  exercise  3.55. 
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Taking  the  negative,  exponent  and  integrating  both  sides, 


s ~hit)dt<  /  e-h(x)-h'(x)(t-x)  dt 

J  —  OO 

/  f1 * * * * * * * * X  / 

—  e~h(x)+xh  (x)  /  e~th  (®) 


_ g  —  h(x)-\-xh!  (x) 


—xh'(x) 


—h'(x) 


e-h(x) 

—h'{x) 


Multiplying  both  sides  by  —h'{x)e  h(-x\ 

- h\x)e~h(x)  f  e~Ht)  dt  <  e~2h{x) 

J  —  OO 


f"{x)f{x)  <  ( f{x)Y . 


9  Code 


□ 


1  function  q  =  thurstone (a, b) 

2  %  This  function  returns  the  estimate  in  the  difference  of  the  means 

3  %  according  to  Thurstone' s  law  of  comparative  judgment  (Case  V) . 

4  %  Each  quality  distribution  is  assumed  to  have  variance  =  1/2 

5  %  so  that  the  variance  of  the  difference  of  any  two  quality 

6  %  random  variables  is  1 . 

7  % 

8  %  Usage: 

9  %  q  =  thurstone (a, b) 

10  %  Input  integers  a  and  b  representing  the  number  of  times 

n  %  object  a  or  b  was  preferred. 

12  %  Example:  q  =  thurstone(2,  3); 

13  % 

14  %  q  =  thurstone  (v) 

15  %  Input  vector  v  is  the  counts  for  the  number  of  times  that 

16  %  v(l)  or  v(2)  was  preferred.  Only  supported  for  vectors  of  length  2. 

17  %  Example:  q  =  thurstone ( [2  3]); 

is  %  This  gives  the  same  result  as  the  example  for  thurstone {2,  3) 

19  %  above . 

20  % 

21  %  q  =  thurstone  (p) 

22  %  Input  can  also  be  a  scalar,  p  for  the  proportion  of  times 

23  %  that  Cl  was  chosen  over  C2 . 

24  %  Example:  q  =  thurstone  ( 0 . 4 ) ; 

25  %  This  gives  the  same  result  as  the  example  for  vector  C  above. 

26  % 

27  %  q  is  the  estimated  difference  in  the  means  of  C(l)  —  C(2) 

28  % 

29  %  S  =  thurstone  (C) 

30  %  C  is  a  square  matrix  of  counts,  where 

31  %  C(i,j)  =  #  times  option  i  was  preferred  over  option  j 

32  %  and  it  is  assumed  that 

33  %  C ( i , j )  +  C  ( j , i )  =  number  of  times  that  option  i  was  judged  against  j. 

34  %  Note  that  diag(q)  =  0  since  comparing  any  2  same  options 

35  %  should  result  in  a  50—50%  chance  of  chosing  each  option. 

36  %  Example:  S  =  thurstone ([0  3;  2  0]); 
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56 

57 

58 

59 

60 
61 
62 
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64 

65 

66 

67 

68 

69 

70 

71 

72 

73 

74 

75 

76 
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78 

79 

80 
81 
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84 
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88 

89 

90 

91 

92 

93 

94 

95 

96 

97 

98 

99 
100 
101 


%  S(l,2)  should  be  equal  to  the  above  examples. 

%  S(2,l)  should  be  equal  to  the  negative  of  the  above  examples. 
%  Generally  S  ==  —S' . 


%  Kristi  Tsukida  <kristi . tsukida@gmail . com> 
%  Created  Jan  11,  2010 
%  Last  Updated  Mar  3,  2010 
%  added  matrix  input 


%  Input  checks 
if  nargin  >  1 

assert ( isnumeric (a) ,  'a  must  be  numeric'); 
assert ( isnumeric (b) ,  'b  must  be  numeric'); 
assert (isscalar (a) ,  'a  must  be  a  scalar'); 
assert (isscalar (b) ,  'b  must  be  a  scalar'); 
assert  (-tisnan  (a)  ,  'a  cannot  be  NaN'); 
assert (-dsnan (b) ,  'b  cannot  be  NaN'); 
assert (a>0,  'a  must  be  positive'); 
assert (b>0,  'b  must  be  positive'); 

p=a/ (a+b) ; 
elseif  isscalar (a) 
p=a; 

assert ( isnumeric (p) ,  'p  must  be  numeric'); 
assert (-dsnan (p) ,  'p  cannot  be  NaN'); 
assert (p>0,  'p  must  be  positive'); 
assert (p<l,  'p  must  be  less  than  1'); 
elseif  isvector(a) 
v=a; 

%assert ( length (v) ==2 ,  'function  not  defined  for  vector  v  longer  than  2'); 
assert ( isnumeric (v) ,  'v  must  be  numeric'); 
assert (all (v>0) ,  'counts  in  v  must  be  positive'); 
assert ( sum (v) >0,  'counts  in  v  must  have  positive  sum'); 

%  vector  v 
if (length (v) ==2) 

p  =  v  (1)  .  /  sum  (v)  ; 

else 

assert (all (v<l ) ,  'probabilities  in  v  must  be  less  than  1'); 

P  =  v; 

end 


else  %  matrix  input 
C=a; 

assert ( isnumeric (C) ,  'C  must  be  numeric'); 
assert (all  (C (:) >0 ) ,  'counts  in  C  must  be  positive'); 
assert ( sum (C (:)) >0,  'counts  in  C  must  have  positive  sum'); 
assert (size (C, 1) ==size  (C, 2) ,  'C  must  be  a  square  matrix'); 

N  =  C  +  C' ; 

%  matrix  of  probabilities 

%  (norminv  handles  matrices  of  probabilities) 
p  =  C  ./  N; 

%  fix  p  so  that  q  ends  up  with  zeros  on  the  diagonal, 
p  (speye  (size  (p)  )>0)  =  0.5; 

end 

%  Assuming  independence,  and  each  quality  RV  has  variance  =  1/2, 
%  the  variance  of  Cl  —  C2  is  1 
sigma  =  1; 

%  Calculate  q 
q  =  norminv (p,  0,  sigma); 

%  same  as  q  =  norminv (p,  0,  l)*sigma; 

%  norminv (p,  0,  1)  is  the  zscore  of  p 
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1 

function  S  =  scale.ls (counts,  threshold) 

2 

%  Use  the  least  squares  complete  matrix  solution 

3 

%  (Thurstone,  Mosteller)  to 

4 

%  scale  a  paired  comparison  experiment  using 

5 

%  Thurstone '  s  case  V  model  (assuming  sigma  *2  =  0.5  for 

each 

6 

%  quality's  distribution) 

8 

%  counts  is  a  n— by— n  matrix  where 

9 

%  counts (i,j)  =  #  people  who  prefer  option  i  over  option  j 

10 

%  S  is  a  length  n  vector  of  scale  values 

12 

if  nargin  <  2 

13 

threshold  =  2; 

14 

15 

end 

16 

[m, mm]  =  size  (counts) ; 

17 

18 

assert (m  ==  mm,  'counts  must  be  a  square  matrix'); 

19 

%  Empirical  probabilities 

20 

P  =  counts  ./  (counts  +  counts'); 

21 

22 

P(eye(m)>0)  =0.5;  %  Set  diagonals  to  have  probability 

0.5 

23 

Z  =  norminv (P); 

24 

S  =  sum (Z, 1 ) '  /  m; 

1  function  S  =  scale_gulliksen (counts,  threshold) 

2  %  Use  the  Morrisey— Gulliksen  incomplete  matrix  solution  to 

3  %  scale  a  paired  comparison  experiment  using 

4  %  Thurstone ' s  case  V  model  (assuming  sigma~2  =  0.5  for  each 

5  %  quality's  distribution) 

6  % 

7  %  (This  code  follows  Gulliksen' s  formulation  given  in  Engeldrum's 

8  %  Psychometric  Scaling  book) 

9  % 

10  %  counts  is  a  n— by— n  matrix  where 

n  %  counts (i,j)  =  #  people  who  prefer  option  i  over  option  j 

12  %  S  is  a  length  n  vector  of  scale  values 

13  %  Scale  values  are  set  up  to  have  mean  of  0 

14 

15  if  nargin  <  2 

16  %  default  threshold  on  scale  difference 

17  threshold  =  2; 
is  end 

19 

20  [m,  mm]  =  size  (counts) ; 

21  assert (m  ==  mm,  'counts  must  be  a  square  matrix'); 

22 

23  %  Empirical  probabilities 

24  P  =  counts  .  /  (counts  +  counts'); 

25  P(eye(m)>0)  =0.5;  %  Set  diagonals  to  have  probability  0.5 

26 

27  %  Thurstone ' s  law  estimates  of  each  pairwise  quality  difference 

28  %  (norminv  calculates  the  z— scores  or  z— value) 

29  %  !  entries  in  Z  could  end  up  NaN 

30  Z  =  norminv  (P); 

31 

32  M  =  double (abs (Z)  >  threshold);  %  1  where  |Z(i, j) |  >  2,  0  otherwise 

33  M(eye(m)>0)  =  m  —  sum(M);  %  set  the  diagonal  values 

34 

35  d  =  sum(Z,  2);  %  =  #  of  valid  comparisons  +  1 

36 

37  S  =  M  \  d;  %  =  inv  (M)  *  d; 
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1  function  S  =  scale_ml (counts,  threshold) 

2  %  Use  cvx  to  compute  maximum  likelihood  scale  values 

3 

4  if  nargin  <  2 

5  %  default  threshold  on  scale  difference 

6  threshold  =2; 

7  end 

8 

9  [m,  mm]  =  size  (counts) ; 

10  assert (m  ==  mm,  'counts  must  be  a  square  matrix'); 

11 

12  previous_quiet  =  cvx.quiet  ( 1 )  ; 

13  cvx_begin 

14  variables  S(m,  1)  t; 

15  U  =  repmat  (S,  1 ,  m)  ; 

16  a  =  U  —  U';  %  a  (i,  j )  =  S(i)  —  S(j) 

17 

18  minimize  (  t  )  ; 

19  subject  to 

20  —sum  (sum  (counts  .  *log_normcdf  (a)  )  )  <  t 

21  sum  ( S )  ==0 

22  cvx.end 

23  cvx.quiet  (previous_quiet )  ; 
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